Abstract: In this paper, we introduce a distributed dynamic 
routing algorithm for secondary users (SUs) to minimize their 
interference with the primary users (PUs) in multi-hop cognitive 
radio (CR) networks. We use the medial axis with a relaxation 
factor as a reference path which is contingent on the states of 
the PUs. Along the axis, we construct a hierarchical structure 
for multiple sources to reach cognitive pilot channel (CPC) base 
stations. We use a temporally and spatially dynamic non-cooperative 
game to model the interactions among SUs as well as the 
influence of PUs on them in the multi-hop structure of the network. 
A multi-stage fictitious play learning is used for distributed 
routing in multi-hop CR networks. We obtain a set of mixed 
(behavioral) Nash equilibrium strategies of the dynamic game 
in closed form by backward induction. The proposed algorithm 
minimizes the overall interference and the average packet delay 
along the routing path from SU nodes to CPC base stations in 
an optimal and distributed manner. 

I. Introduction 

The primary users (PUs) directly affect the spectrum
opportunities available to the secondary users (SUs). As a
consequence, multi-hop cognitive radio (CR) networks experience
time-varying wireless channel conditions and a dynamic network
topology [1]-[3]. Recently, the cognitive
pilot channel (CPC) has been suggested to provide frequency
and geographical information to SUs, assisting them in sensing
and accessing the spectrum [4]-[6]. As a result, the SUs
can improve their performance and avoid scanning the entire
spectrum to identify the spectrum holes and available PUs. The
on-demand CPC transmits the information only at the request
of a SU terminal. By including both an uplink and a downlink
channel, the on-demand CPC enables a wider range of CPC-based
applications in addition to the retrieval of the information
about operators, radio access technologies, and frequency lists.
To establish an effective CPC network, a multi-hop distributed
CR network scheme is required [3], [6]. The main existing
work in this area has focused on the information contents that
CPCs can convey, as well as on the implementation aspects
of the channels. To the best of our knowledge, no work has
been done to investigate an optimal dynamic routing scheme
for the on-demand CPC to route requests and deliver the CPC
information.

Due to the dynamic interception of frequency channels by
PUs, SU networks should dynamically update their routing
paths. Although a few distributed routing algorithms have been
suggested for multi-hop CR networks [7], [8], it is imperative
to study an effective, dynamic and intelligent routing scheme in
CR networks that considers the time-varying channel states and
minimizes the interference not only over the routing path but
also over a long time horizon. In [9], a network formation game
algorithm has been studied for multi-hop CDMA networks in 
which wireless users attempt to connect to the base station 
via other users in the cellular network. The routing scheme is 
based on a spatial dynamic game in which each user optimizes 
the multi-stage interference along the path. In this paper, we 
consider a similar dynamic routing game framework for the 
application to multi-hop CR networks. Given the time-varying 
nature of the PUs, we investigate a spatially and temporally 
dynamic game framework which takes into account the state 
variation of the PUs as well as the multi-stage property of the 
CR networks. 

The main contribution of this paper is to provide a distributed 
and optimal dynamic multi-hop network routing scheme for 
the on-demand CPCs. We consider a CR network comprised 
of PUs and SUs. The SUs form multi-hop hierarchical levels to 
the CPC base stations and their performance is based on the 
location of PUs and their states. In this work, we use thresholds 
to separate the SUs logically into different layers and allow SUs 
at each level to play a game against other users by choosing the 
optimal route. In simulations, we observe that with the presence 
of a PU, the SUs deviate from their original routes to avoid 
collisions with the PU. The proposed algorithm minimizes the 
interference with PUs and packet delay along the routing path. 

The rest of the paper is organized as follows. Section II 
presents the system model, and the game-theoretical model for 
the CPC network routing is described in Section III. In Section 
IV, we analyze the dynamic game and characterize the mixed 
Nash equilibrium in a recursive form. In addition, we devise an 
algorithm based on the fictitious play learning for the dynamic 
multi-hop network routing game. The simulation results are 
described and analyzed in Section V. Finally, conclusions are 
drawn in Section VI. 

II. System Model 

Fig. 1. A snapshot of the proposed dynamic multi-hop network formation for
the on-demand CPC on a frequency channel $m_1$.

In CR networks with the presence of PUs who constantly
intercept frequency channels, the multi-hop routing of SUs needs
to be made dynamic, distributed and efficient to reduce their
total interference with the PUs. We assume that SU nodes are
capable of acquiring the knowledge of the channel conditions
and their neighboring relays. SUs dynamically form routing
patterns and allocate transmission channels in the spectrum
holes of the CR networks. Unlike multi-channel multi-radio
schemes, which operate on one channel at a time, multi-hop CR
networks can switch frequency channels on a per-packet basis.
In this paper, we consider the scenario in which the network 
relays are CR SU nodes and the data is relayed from source 
nodes to CPC base stations. We assume that the data format of 
the on-demand CPC channel is determined, and consequently, 
the bandwidth of a frequency channel is fixed for every SU. 

Let $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ be a topology graph for a multi-hop CR
network, where $\mathcal{N} = \{n_1, \ldots, n_N\}$ is a set of $N$ SU nodes
including the source SU nodes and relay SU nodes; and
$\mathcal{E} = \{e_1, \ldots, e_E\}$ is a set of $E$ links connecting the SUs and the relay
nodes. In addition, we let $\mathcal{M} = \{m_1, \ldots, m_K\}$ be a set of $K$
PUs and $\mathcal{K}$ be the set of CPC channels. We assume that the
set of frequency channels is identical to the set of PUs. In the
network routing problem, we assume that the set $\mathcal{N}$ is known,
but we need to determine a set of $E$ links that optimizes the
network utilities (to be discussed in Section III). A SU node,
$n_i \in \mathcal{N}$, establishes a link with its neighboring nodes. Each
PU $m_k \in \mathcal{M}$ is associated with a channel, which can be in
either an occupied or an unoccupied state. Let $\mathcal{S}_k$, $m_k \in \mathcal{M}$,
be the set of channel states of PU $m_k$. A system state
$s = \{s_k\}_{m_k \in \mathcal{M}} \in \mathcal{S} := \prod_{k=1}^{K} \mathcal{S}_k$ is a collection of individual states
of each primary channel $m_k$.
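
As a small illustration of the product state space described above, the following sketch enumerates the joint states of $K$ primary channels. It assumes each channel has exactly two states, encoded here as 0 (unoccupied) and 1 (occupied); the encoding and the function name `system_states` are hypothetical choices made for this example only.

```python
from itertools import product

# Assumed encoding of one PU channel: 0 = unoccupied, 1 = occupied.
CHANNEL_STATES = (0, 1)


def system_states(num_pus):
    """Enumerate the joint state space S = S_1 x ... x S_K for K PUs."""
    return list(product(CHANNEL_STATES, repeat=num_pus))


states = system_states(3)
print(len(states))  # number of joint states for K = 3 two-state channels
print(states[:3])   # each state is a tuple (s_1, ..., s_K)
```

The joint state space grows exponentially in the number of PUs, which is one reason a distributed solution is attractive.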

We consider that the CR network with SUs and relay nodes
can be separated into a hierarchical structure [9] using the
medial axis [10] as a reference path as shown in Fig. 1.
Unlike [10], particular physical channel conditions, such as
delay constraints and spatial PUs' footprints, are required for
multi-hop CR networks. A SU has to connect to a CPC
base station to request the CPC information through CR relay
nodes in multi-hop CR networks. Even though the SUs can
be geographically distributed over the network, we can view
the source nodes as nodes residing at the initial level 1. We
let $\mathcal{L}(s) = \{1, 2, \ldots, L(s)\}$ be a set of hierarchical levels
at state $s$, comprising a total of $L(s)$ levels with the first
layer consisting of the source SUs and the terminal layer
$L(s)$ consisting of nodes that directly connect to CPC base
stations. We define the medial axis between two PUs as the
curve that describes the geographical points associated with
SUs where the lowest power from PUs is perceived. We assume
that each SU node is capable of sensing PUs and learning their
footprints. The information of the footprint of PUs is supposed
to be delivered to all SU nodes by flooding at the initialization
of the SU network. SU nodes dynamically learn the pattern of
the PUs' footprint. Denote by $A$ the medial axis and let $\mathcal{N}_0^A$ be
the set of nodes on the medial axis $A$. We use the medial axis
$A$ as the reference path along which the relay nodes between
source nodes and CPC base stations are separated into $L(s)$
hierarchical levels. We suggest a relaxation factor, $\omega = \delta/r$,
where $r$ is the radius of the canonical circle around the axis $A$
and the parameter $\delta$ is chosen between 0 and 1. The relaxation
factor allows us to increase the area around the medial axis and
consequently enlarge the area of SU nodes deployed in the
network. Similarly, we use $\mathcal{N}_\delta^A$ to denote the set of nodes in the
relaxed area.
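
The selection of nodes in the relaxed area can be sketched as follows. This is only an illustration under stated assumptions: the medial axis is approximated by a polyline, and a node is taken to lie in the relaxed area when its distance to the axis is at most the relaxation factor times the radius $r$. The function names, the polyline representation, and the node coordinates are all hypothetical and not taken from the paper.

```python
import math


def dist_point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nodes_in_relaxed_area(nodes, axis, omega, r):
    """Return ids of nodes within omega * r of a polyline axis (assumed rule)."""
    width = omega * r
    selected = []
    for node_id, pos in nodes.items():
        d = min(dist_point_to_segment(pos, axis[k], axis[k + 1])
                for k in range(len(axis) - 1))
        if d <= width:
            selected.append(node_id)
    return selected


# Hypothetical layout: a straight axis along the x-axis and three SU nodes.
nodes = {1: (0.0, 0.05), 2: (0.5, 0.4), 3: (1.0, -0.1)}
axis = [(0.0, 0.0), (1.0, 0.0)]
print(nodes_in_relaxed_area(nodes, axis, omega=0.7, r=0.2))
```

A larger relaxation factor admits more relay candidates at the price of routes that pass closer to the PUs.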

In an on-demand CPC scheme, each CPC base station can
convey the statistics on the availability of idle frequency
channels to the SU nodes to assist them in updating their spectrum
knowledge. SU nodes access the nearest CPC base station to
request the information. In our framework, we include the set
of SUs $\mathcal{N}_\delta^A$ in the set $\mathcal{N}$, among which we determine the set
of links $\mathcal{E}$ to connect SUs to CPCs. The remaining nodes
are the relay nodes along the medial axis that are capable
of transmitting packets in a multi-hop fashion.

III. Game Theory Model of Multi-hop Network 

In this section, we describe a stochastic multi-stage network
routing game defined by $\Xi = \{\Xi_h(s)\}_{s \in \mathcal{S}, h \in \mathcal{L}(s)}$, which is a
set of games indexed by state $s$ and hierarchy level $h$. Suppose
the network maintains the same hierarchy structure at each
state. Then $\Xi$ can be viewed as a matrix of games whose row
indicates a spatial network routing game $\Xi(s)$ at a particular
state $s$ and whose column is a temporal or state collection
of games at the same level $h$. The network formation
game $\Xi(s)$ at a given state $s$ is well defined by a sequence of
games $\{\Xi_h(s)\}_{h=1,\ldots,L(s)}$, where $\Xi_h(s)$ is a game at level $h$
and state $s$. Each SU can access a PU's channel only
when it is idle. Let $P_k : \mathcal{S}_k \times \mathcal{S}_k \to [0, 1]$ be the state
transition law of PU channel $m_k$ on $\mathcal{S}_k$. Since the transition
probabilities between states $x, y \in \mathcal{S}$ are only controlled by
the PUs, the stationary distribution $\pi = [\pi_s]_{s \in \mathcal{S}}$ of the Markov
chain $(\mathcal{S}_k, P_k)$, which we assume to exist, is independent of
the actions of the SUs. Let $\mathcal{H}_l$, $l \in \mathcal{L}(s)$, be the set of SUs that
belong to the level $l$. It is clear that the sets $\mathcal{H}_l$ are mutually
exclusive and $\bigcup_{l \in \mathcal{L}(s)} \mathcal{H}_l = \mathcal{N}$.
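
To make the stationary distribution of a PU channel concrete, the sketch below computes it by repeated multiplication with the transition matrix. The two-state chain and its transition probabilities are hypothetical values chosen for illustration; the paper does not specify them.

```python
def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution pi = pi * P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return pi


# Hypothetical PU channel: state 0 = idle, state 1 = occupied.
P_k = [[0.9, 0.1],
       [0.3, 0.7]]
pi = stationary_distribution(P_k)
print(pi)  # long-run fractions of time the channel is idle and occupied
```

Because the SUs' actions do not enter `P_k`, the distribution can be computed once per channel and reused by every SU.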

Denote by $l_i \in \mathcal{L}(s)$ the level where user $n_i$ resides. To
find a multi-hop connection to the CPC channel, user $n_i$ needs
to find a node to connect to at the next level $l_i + 1$. In this
paper, the chosen node $(n_i, l_i + 1)$ can potentially yield an
optimal path with minimum payoff leading to a connection to
the destination. Let the set $\mathcal{N}_l(s)$ denote the nodes that are
available for connection at the next level to a node at level $l$
at a particular state $s$. If $l_i = L(s)$, we let $\mathcal{N}_{l_i} = \mathcal{K}$. Hence,
$(n_i, l_i + 1)$ is a node chosen by $n_i$ at the next level from
the action set $\mathcal{N}_{l_i}(s)$ available to node $(n_i, l_i)$. By default, we
define $(n_i, l_i) := n_i$.

The local stage payoff to user $n_i$ is
$u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) : \mathcal{S} \times \mathcal{N}_{l_i}^{|\mathcal{H}_{l_i}|} \to \mathbb{R}$.
It is an instantaneous payoff
at the local stage $l_i$ which depends on the local actions of all
users at the level $l_i$. As a convention, we use $n_{-i} := \mathcal{H}_{l_i} \setminus \{n_i\}$
to denote the set of users other than $n_i$ at the level $l_i$ and
$(n_{-i}, l_i + 1) := \{(n_j, l_i + 1) : n_j \in \mathcal{H}_{l_i} \setminus \{n_i\}\}$ to mean the
set of actions by the set of users $n_{-i}$. The coupling among
the utility functions induces a noncooperative environment
in which user $n_i$ competes with the other users $n_{-i}$ to achieve
optimal utilities.

In the multi-hop CR networks, spectrum occupancy is 
location-dependent, and therefore, along a multi-hop path, 
available spectrum channels may be different at each relay node 
0. If the surrounding PUs are highly active, their availability 
for communication becomes scarce for SU nodes,
resulting in long routing delay in CR networks. Hence, we
consider the queueing delay in the payoff function of the
dynamic game. If a PU frequently intercepts the channels,
the channels under current use need to be switched to other
unoccupied channels and this switching time is added to the
processing delay. In addition, if PUs intensively occupy all
channels for a long time, SU nodes have no idle frequency
channels and, consequently, the re-routing delay increases and
results in more packet delay. The expected total packet delay $\tau_i$
perceived at SU node $n_i$ is defined by the Pollaczek-Khinchin
formula for the M/G/1 queueing system as follows [11]:

$$\tau_i(s, (n_i, l_i), (n_{-i}, l_i)) = \frac{\lambda_{(n_i, l_i)}(s)\, \overline{X^2}_{(n_i, l_i)}(s)}{2\left(1 - \rho(s, (n_i, l_i), (n_{-i}, l_i))\right)} + \overline{X}_{(n_i, l_i)}(s), \tag{1}$$

where $\lambda_{(n_i, l_i)}(s)$ is the state-dependent arrival rate of packets
from nodes at level $l_i$ seen at the node chosen by $n_i$ and
$\rho(s, (n_i, l_i), (n_{-i}, l_i)) = \lambda_{(n_i, l_i)}(s) / \mu_{(n_i, l_i)}(s) = \lambda_{(n_i, l_i)}(s)\, \overline{X}_{(n_i, l_i)}(s)$,
where $\mu_{(n_i, l_i)}(s)$ is the service rate at
the node $(n_i, l_i)$. $\overline{X}_{(n_i, l_i)}$ is the mean service time per packet
at the chosen node and $\overline{X^2}_{(n_i, l_i)}(s)$ is the second moment
of the service time. The coupling between transmitting nodes at level
$l_i$ is evident from (1). When more nodes choose to connect to
the same node $(n_i, l_i)$ as node $n_i$ does, they will experience
more delay as the arrival rate $\lambda_{(n_i, l_i)}$ increases.
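
The delay in (1) is straightforward to evaluate numerically. The sketch below is a direct transcription of the Pollaczek-Khinchin mean delay for an M/G/1 queue; the function name `pk_delay` and the example numbers are hypothetical. The example uses exponential service, for which the second moment is twice the squared mean, so the result can be cross-checked against the M/M/1 delay $1/(\mu - \lambda)$.

```python
def pk_delay(arrival_rate, mean_service, second_moment):
    """Mean total delay (waiting plus service) of an M/G/1 queue."""
    rho = arrival_rate * mean_service  # utilization, must stay below 1
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization rho >= 1")
    waiting = arrival_rate * second_moment / (2.0 * (1.0 - rho))
    return waiting + mean_service


# Hypothetical relay: exponential service with mean 0.1 s (rate mu = 10 per s).
mean_service = 0.1
second_moment = 2.0 * mean_service ** 2

# Delay grows as more upstream nodes route through the same relay.
for arrival_rate in (2.0, 5.0, 8.0):
    print(arrival_rate, pk_delay(arrival_rate, mean_service, second_moment))
```

The loop illustrates the coupling noted above: each additional node that selects the same relay raises the arrival rate and therefore the delay seen by everyone using that relay.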

Each node $n_i$ calculates its payoff $u_i(s, (n_i, l_i), (n_{-i}, l_i))$ of
connecting to $(n_i, l_i)$ given by

$$u_i(s, (n_i, l_i), (n_{-i}, l_i)) = \tau_i(s, (n_i, l_i), (n_{-i}, l_i)). \tag{2}$$

The degree of freedom of user $n_i$'s choice is constrained by the
set $\mathcal{N}_{l_i}(s)$. Given a state $s$, a user aims to optimize his long-term
payoff along his path to a CPC base station rather than merely
to optimize his local myopic payoff $u_i$ at level $l_i$. Denote by
$\mathcal{P}(n_i, l)$, $l_i < l \le L(s)$, the path of a node $n_i$ to a node at
level $l$. It is clear that when $l = L(s)$, $\mathcal{P}(n_i, l)$ refers to the
path from a SU to the destination and, by default, when
$l = l_i + 1$, $\mathcal{P}(n_i, l_i + 1)$ is a link from $n_i$ to the node $(n_i, l_i + 1)$.
For example, the path $\mathcal{P}(n_i, l_i + 2)$ is composed of two
links: $\mathcal{P}(n_i, l_i + 1)$ and $\mathcal{P}((n_i, l_i + 1), l_i + 2)$. The user $n_i$ can
only have control over the link $\mathcal{P}(n_i, l_i + 1)$ while the link
$\mathcal{P}((n_i, l_i + 1), l_i + 2)$ and the links onwards are controlled by
other nodes.

Let $U_i(s, (\mathbf{n}_i, l), (\mathbf{n}_{-i}, l)) : \mathcal{S} \times \prod_{l' \in \{l_i, l_i+1, \ldots, l\}} \mathcal{N}_{l'}^{|\mathcal{H}_{l'}|} \to \mathbb{R}$
denote the path utility attained by the user $n_i$ at state $s$ over
the path $\mathcal{P}(n_i, l)$. Due to the delay, interference and coupling,
the utility also depends on the paths of other users. Following
the convention, we use $(\mathbf{n}_i, l) := \{((n_i, l'), l' + 1) : l_i \le l' \le l - 1\}$
to denote the path of user $n_i$ up to the level $l$ and use
$(\mathbf{n}_{-i}, l) := \{((n_j, l'), l' + 1) : n_j \in \mathcal{H}_{l_i} \setminus \{n_i\}, l_i \le l' \le l - 1\}$
to denote the actions by other users up to the level $l$ along
their chosen path $\mathcal{P}(n_{-i}, l)$. The goal of a SU $n_i$ is to attain a
connection to a CPC base station with optimal utility along the
path. The path utility $U_i(s, (\mathbf{n}_i, L(s)), (\mathbf{n}_{-i}, L(s)))$ of a user
$n_i$ over the path $\mathcal{P}(n_i, L(s))$ can be expressed as a sum of
stage utilities as follows:

$$U_i(s, (\mathbf{n}_i, L(s)), (\mathbf{n}_{-i}, L(s))) = \sum_{l=l_i}^{L(s)} u_{(n_i, l)}(s, (n_i, l + 1), (n_{-i}, l + 1)), \tag{3}$$

where $u_{(n_i, l)}(s, (n_i, l + 1), (n_{-i}, l + 1))$ denotes the payoff
to node $(n_i, l)$ when it chooses directly the node $(n_i, l + 1)$ in
the next level.

At a fixed state $s$, a SU $n_i$ needs to choose an optimal path
that achieves the optimal payoff. However, he can only choose
his next hop connection $(n_i, l_i + 1)$ from $\mathcal{N}_{l_i}(s)$ and leaves the
future choices to the node that he connects to. Let $U_i^*(s)$
be the optimal payoff. The path utility (3) can be rewritten as

$$\begin{aligned}
U_i^*(s, (\mathbf{n}_{-i}, L(s)))
&= \min_{(n_i, l_i+1) \in \mathcal{N}_{l_i}(s)} \sum_{l=l_i}^{L(s)} u_{(n_i, l)}(s, (n_i, l + 1), (n_{-i}, l + 1)) \\
&= \min_{(n_i, l_i+1) \in \mathcal{N}_{l_i}(s)} \Big[ u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) + \sum_{l=l_i+1}^{L(s)} u_{(n_i, l)}(s, (n_i, l + 1), (n_{-i}, l + 1)) \Big] \\
&= \min_{(n_i, l_i+1) \in \mathcal{N}_{l_i}(s)} \Big[ u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) + U^*_{(n_i, l_i+1)}(s, (\mathbf{n}_{-(n_i, l_i+1)}, L(s))) \Big].
\end{aligned} \tag{4}$$
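
The recursion in (4), an instantaneous payoff plus a payoff-to-go, can be sketched with a small backward pass over the hierarchy levels. This is a simplified illustration only: it assumes the hop costs are fixed numbers, so it ignores the coupling with other users and uses pure rather than mixed strategies. The function name, the level layout, and the costs are hypothetical.

```python
def backward_induction(levels, stage_cost):
    """Compute the payoff-to-go and best next hop for every node.

    levels[0] holds the source nodes and levels[-1] the CPC base stations.
    stage_cost(u, v) is the (assumed fixed) cost of the hop from u to v.
    """
    cost_to_go = {node: 0.0 for node in levels[-1]}
    next_hop = {}
    # Start from the last level and propagate the solution backwards.
    for l in range(len(levels) - 2, -1, -1):
        for u in levels[l]:
            best = min(levels[l + 1],
                       key=lambda v: stage_cost(u, v) + cost_to_go[v])
            next_hop[u] = best
            cost_to_go[u] = stage_cost(u, best) + cost_to_go[best]
    return cost_to_go, next_hop


# Hypothetical three-level network: one source, two relays, one base station.
levels = [["s"], ["a", "b"], ["cpc"]]
costs = {("s", "a"): 1.0, ("s", "b"): 2.0,
         ("a", "cpc"): 5.0, ("b", "cpc"): 1.0}

cost_to_go, next_hop = backward_induction(levels, lambda u, v: costs[(u, v)])
print(next_hop["s"], cost_to_go["s"])
```

In the example the source prefers relay `b` even though the hop to `a` is locally cheaper, which is the difference between optimizing the path utility and the myopic stage payoff.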

On the right-hand side of (4), $U^*_{(n_i, l_i+1)}$ can be seen as
the payoff-to-go and $u_i$ is the current instantaneous payoff
to optimize. A solution to (4) can be found using backward
induction, where we start with the nodes at the last level and
then propagate the solution to level $l_i$. Since every user on
the same level as $n_i$ optimizes his utility in the same way,
the best responses of (4) to other nodes on the same level
can lead to a Nash equilibrium where no node finds it to
its benefit to deviate from its chosen action. However, the
existence of a Nash equilibrium for the game at a particular
level is not guaranteed. To ensure the existence, we adopt
mixed strategies in which the users at level $l_i$ randomize
over $\mathcal{N}_{l_i}(s)$. Let $\mathbf{f}_{i,l}(s) = \{f_{(n_i, l')} : l_i \le l' \le l\}$ denote
a mixed strategy of $n_i$ at a given state $s$ up to level $l$. It
is clear that when $l = l_i$, then $\mathbf{f}_{i,l}(s)$ only contains the
mixed strategies of node $n_i$. For $l > l_i$, $\mathbf{f}_{i,l}(s)$ comprises
sets of mixed strategies along the path up to $(n_i, l)$. We let
$\mathbf{F}_l := \{\mathbf{f}_{i,l'}(s), n_i \in \mathcal{H}_{l_i}, l_i \le l' \le l\}$ be the set of user strategies
at level $l$ and $\mathbf{F}_{-i,l} := \{\mathbf{f}_{j,l'}(s), n_j \in \mathcal{H}_{l_i} \setminus \{n_i\}, l_i \le l' \le l\}$.
Let $\mathbf{F} := \{\mathbf{f}_{i,l}(s), n_i \in \mathcal{H}_l, l \in \mathcal{L}(s)\}$.

A user $n_i$ chooses a mixed strategy to minimize his expected
total payoff of (3), i.e.,

$$U_i(s, \mathbf{F}) = \sum_{l=l_i}^{L(s)} \mathbb{E}_{\mathbf{f}_{i,l}, \mathbf{F}_{-i,l}}\, u_{(n_i, l)}(s, (n_i, l + 1), (n_{-i}, l + 1)). \tag{5}$$

Following (4), we have

$$U_i^*(s, \mathbf{f}_{i,l_i}, \mathbf{F}_{-i,l_i}) := \min \mathbb{E}_{\mathbf{f}_{i,l_i}, \mathbf{F}_{-i,l_i}} \big[ u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) + U^*_{(n_i, l_i+1)}(s) \big]. \tag{6}$$

Similarly, the game with expected payoff can also be solved
by backward induction. At each state, a matrix game needs to
be solved and each user $n_i$ generates a mixed strategy on their
action set. Hence, what we will be encountering here is a Nash
equilibrium in behavioral strategies.

IV. Dynamic Multi-hop Routing

A user $n_i$ faces a long-term payoff when the cognitive
radio system evolves on the state space $\mathcal{S}$. Let $\{s_t, t \in \mathbb{Z}\}$
be a sequence of states indexed by time $t$. Let $V_i(s)$ be
the value function of user $n_i$ when $n_i$ starts in state $s$, i.e.,
$s_0 = s$. We consider mixed (behavioral) stationary strategies
for user $n_i$ that are only dependent on their current state.
Denote by $f_i(s, a_i)$ the probability of user $n_i$ choosing a
next level node $a_i \in \mathcal{N}_{l_i}$ at state $s \in \mathcal{S}$. The vector
$f_i(s) \in [0, 1]^{|\mathcal{N}_{l_i}|}$ is the state dependent mixed strategy given
by $f_i(s) = [f_i(s, a_i)]_{a_i \in \mathcal{N}_{l_i}}$. Let $\mathcal{F}_i(s)$ denote the set of all
such strategies, i.e.,

$$\mathcal{F}_i(s) = \Big\{ f_i(s) \in [0, 1]^{|\mathcal{N}_{l_i}|} : \sum_{a_i \in \mathcal{N}_{l_i}} f_i(s, a_i) = 1 \Big\}. \tag{7}$$

Let $V_i$ be the long-term infinite horizon path utility function
which depends on $f_i(s)$, $n_i \in \mathcal{H}_{l_i}$; it is given by

$$\begin{aligned}
V_i(s, f_i(s), f_{-i}(s))
&= \mathbb{E}_{s, f_i(s), f_{-i}(s)} \Big[ \sum_{t=0}^{\infty} \beta^t\, U_i(s_t, (\mathbf{n}_i, L(s_t)), (\mathbf{n}_{-i}, L(s_t))) \,\Big|\, s_0 = s \Big] \\
&= \sum_{t=0}^{\infty} \sum_{l=l_i}^{L(s)} \beta^t\, \mathbb{E}_{s, f_i(s), f_{-i}(s)} \big[ u_{(n_i, l)}(s_t, (n_i, l + 1), (n_{-i}, l + 1)) \,\big|\, s_0 = s \big],
\end{aligned} \tag{8}$$

where $0 < \beta < 1$ is a discounting factor. The value function is
yielded by the optimization over the long term utility $V_i$ given
the other users' mixed stationary strategies $f^*_{-i}(s)$,

$$V_i^*(s) = \min_{f_i \in \mathcal{F}_i(s)} V_i(s, f_i(s), f^*_{-i}(s)), \quad \forall n_i \in \mathcal{N}. \tag{9}$$

Since the utilities $u_i$ are bounded, $V_i$ is bounded and
the interchange of expectation and summation is possible by
Fubini's Theorem [12]. Assume $L(s) = L$ for all $s \in \mathcal{S}$. We
can rewrite (8) as

$$\begin{aligned}
V_i(s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s))
&= \sum_{l=l_i}^{L} \sum_{t=0}^{\infty} \beta^t\, \mathbb{E}_{s, \mathbf{f}_{i,l}(s), \mathbf{F}_{-i,l}(s)} \big[ u_{(n_i, l)}(s_t, (n_i, l + 1), (n_{-i}, l + 1)) \,\big|\, s_0 = s \big] \\
&= \sum_{t=0}^{\infty} \beta^t\, \mathbb{E}_{s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s)} \big[ u_i(s_t, (n_i, l_i + 1), (n_{-i}, l_i + 1)) \,\big|\, s_0 = s \big] \\
&\quad + \sum_{l=l_i+1}^{L} \sum_{t=0}^{\infty} \beta^t\, \mathbb{E}_{s, \mathbf{f}_{i,l}(s), \mathbf{F}_{-i,l}(s)} \big[ u_{(n_i, l)}(s_t, (n_i, l + 1), (n_{-i}, l + 1)) \,\big|\, s_0 = s \big].
\end{aligned} \tag{10}$$

Denote by $W_i(s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s))$ the local infinite-horizon
utility function, i.e.,

$$W_i(s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s)) = \sum_{t=0}^{\infty} \beta^t\, \mathbb{E}_{s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s)} \big[ u_i(s_t, (n_i, l_i + 1), (n_{-i}, l_i + 1)) \,\big|\, s_0 = s \big].$$

Hence, (10) can be rewritten as

$$V_i(s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s)) = W_i(s, \mathbf{f}_{i,l_i}(s), \mathbf{F}_{-i,l_i}(s)) + \sum_{l=l_i+1}^{L} W_{(n_i, l)}(s, \mathbf{f}_{i,l}(s), \mathbf{F}_{-i,l}(s)).$$

The payoff function $V_i$ has two components. One is the local
infinite time horizon payoff and the other is the payoff-to-go
infinite time horizon payoff. Both components depend on the
strategy made by user $n_i$.

A. Nash Equilibrium and Backward Induction

We intend to find the Nash equilibrium of the game defined
by $\Xi$ with the utility function in (8).

Definition 1: Let $\Phi_l$ be the level $l$ game denoted by
$\Phi_l := \langle \mathcal{H}_l, \{V_i, n_i \in \mathcal{H}_l\}, \{\mathcal{A}_i, n_i \in \mathcal{H}_l\}, P, \mathcal{S} \rangle$, where $V_i$ is the
utility defined in (8) and $\mathcal{A}_i = \mathcal{A}_l = \mathcal{N}_l = \mathcal{H}_{l+1}$, $\forall n_i \in \mathcal{H}_l$,
is the action space for users $n_i \in \mathcal{H}_l$. The mixed stationary
strategy $\mathbf{F}_l^*$ is a Nash equilibrium if it satisfies, $\forall n_i \in \mathcal{H}_l$,

$$V_i(s, \mathbf{F}_l^*(s)) \le V_i(s, \mathbf{f}_{i,l}(s), \mathbf{F}^*_{-i,l}(s)), \quad \forall \mathbf{f}_{i,l}(s) \in \mathcal{F}_i(s).$$

The sequence of games $\{\Phi_l\}_{l \in \mathcal{L}}$ admits a mixed stationary Nash
strategy $\mathbf{F}^*$ if for all $l \in \mathcal{L}$, $\mathbf{F}_l^*$ is a mixed stationary Nash
strategy for the game $\Phi_l$.

The value function $V_i^*$ is the payoff at the Nash equilibrium,
i.e., $V_i^*(s) = V_i(s, \mathbf{F}^*(s))$. As a short-hand notation, we use

$$\mathbf{f}^*_{i,l_i} \in \arg \mathrm{NE}_i \{V_i(s, \mathbf{F}_{l_i}(s))\}, \qquad V_i^* = \mathrm{NE}_i \{V_i(s, \mathbf{F}_{l_i}(s))\},$$

to denote the Nash equilibrium and its corresponding value
function respectively. Using induction, we have the following
result.

Theorem 1: Let $r_{i,l_i}(s) = \mathrm{NE}_i\{u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1))\}$,
$n_i \in \mathcal{H}_{l_i}$, and $\mathbf{r}_{i,l_i}$ be its vector form with each entry
corresponding to one state. Also let $P = [P_{ss'}]_{s, s' \in \mathcal{S}}$ be the
transition matrix. At the last stage, $l_i = L$, we obtain

$$\mathbf{f}^*_{i,L}(s) \in \arg \mathrm{NE}_i \{u_i(s, (n_i, L + 1), (n_{-i}, L + 1))\},$$

$$\mathbf{v}_i = [I - \beta P]^{-1} \mathbf{r}_{i,L},$$

where $r_{i,L}(s) = \mathrm{NE}_i\{u_i(s, (n_i, L + 1), (n_{-i}, L + 1))\}$. If $\mathcal{K} = \{K_0\}$
is a singleton set, then $r_{i,L}(s) = u_i(s, K_0, \ldots, K_0)$. The
mixed stationary strategies $\mathbf{F}^*$ of the game $\Xi$ can be found
recursively by

$$\mathbf{f}^*_{i,l_i}(s) \in \arg \mathrm{NE}_i \{u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) + v_{(n_i, l_i+1)}(s)\}, \quad n_i \in \mathcal{H}_{l_i}.$$

The optimal payoff to a node $n_i$ playing $\mathbf{f}^*_{i,l_i}(s)$ yields a value
function

$$\mathbf{v}_i = [I - \beta P]^{-1} \mathbf{r}_{i,l_i}, \quad \forall n_i \in \mathcal{H}_{l_i}, \tag{12}$$

where

$$r_{i,l_i}(s) = \begin{cases}
\mathrm{NE}_i\{u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1)) + v_{(n_i, l_i+1)}(s)\} & \text{if } 1 \le l_i \le L - 1, \\
\mathrm{NE}_i\{u_i(s, (n_i, l_i + 1), (n_{-i}, l_i + 1))\} & \text{if } l_i = L.
\end{cases}$$

The recursion involves finding $\mathbf{r}_{i,l_i}$ and $\mathbf{f}^*(s)$ through the value
functions of nodes at the next level.

TABLE I

The skeleton of the proposed dynamic routing algorithm

1: Global procedure
     Initialize network
     Define $\partial R$, $A$, $\omega$, $\mathcal{N}_\delta^A$
     Update map
2: Local procedure
     Start game $\Xi(s)$
     SUs learn the mixed strategies by fictitious play at each level
     End game $\Xi(s)$
3: Packet transmission procedure
4: Complete forwarding the request of CPC information
5: Acquire information from CPC
6: Go to update map if new information is obtained
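
The value vector in Theorem 1 solves the linear system $\mathbf{v} = \mathbf{r} + \beta P \mathbf{v}$, so it can be computed without an explicit matrix inverse. The sketch below uses fixed-point iteration, which converges because $0 < \beta < 1$. The transition matrix, the stage payoffs, and the discount factor are hypothetical numbers, and the function name is an assumption made for this example.

```python
def discounted_value(P, r, beta, tol=1e-12, max_iter=100000):
    """Solve v = r + beta * P v by fixed-point iteration (0 < beta < 1)."""
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [r[s] + beta * sum(P[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state PU channel and per-state equilibrium stage payoffs.
P = [[0.9, 0.1],
     [0.3, 0.7]]
r = [1.0, 2.0]   # e.g. expected delay when the channel is idle / occupied
beta = 0.5

v = discounted_value(P, r, beta)
print(v)  # discounted long-run payoff when starting in each state
```

In the recursion of Theorem 1, the vector computed for a node at level $l_i + 1$ would be added to the stage payoffs of the nodes at level $l_i$ before their own equilibrium is evaluated.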

B. Algorithm Description 

We propose a distributed dynamic routing algorithm for
the SUs to minimize their interference with the PUs. Each
node improves its current payoffs and takes into account the
previous payoffs. Summarized in Table I, the algorithm starts
with a global procedure defining the set of hierarchies from
the sources to the destinations. In this step, we initialize the
boundary, the PU footprint map, and the medial axis, and they are
dynamically updated whenever the SUs acquire new system
knowledge from CPCs. Once the source-destination pair
is defined, the set $\mathcal{N}_\delta^A$ along the medial axis is defined
as a reference routing path with the relaxation factor $\omega$. The
second step of the algorithm is a local procedure in which each
user calculates his payoff and selects the best routing node at
the next level with the minimum delay. The node updates its
mixed strategies by fictitious play [13] until the game converges
to its Nash equilibrium. We use fictitious play at each stage
$l \in \mathcal{L}$ to find the mixed Nash equilibrium at that level. After
SUs acquire the state information from CPCs, the nodes update
the map, including the PUs' map and topology information, and the
sets of nodes $\mathcal{N}_0^A$ and $\mathcal{N}_\delta^A$.
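
The local procedure can be illustrated with a minimal fictitious play loop for two SUs at the same level that choose between two relays. This is a toy sketch under stated assumptions, not the paper's simulation setup: the delay model (a base delay per relay plus a fixed penalty when both SUs pick the same relay), the uniform initial counts, and all names are hypothetical.

```python
def fictitious_play(cost, num_actions, iterations=200):
    """Two-player fictitious play; returns each player's empirical frequencies.

    cost(own, other) is the delay a player incurs for a pair of actions.
    """
    # counts[i][a]: how often player i chose action a (uniform prior of 1).
    counts = [[1] * num_actions for _ in range(2)]
    for _ in range(iterations):
        choices = []
        for i in range(2):
            opponent = counts[1 - i]
            total = sum(opponent)
            belief = [c / total for c in opponent]
            # Expected delay of each action against the empirical belief.
            expected = [
                sum(belief[b] * cost(a, b) for b in range(num_actions))
                for a in range(num_actions)
            ]
            choices.append(min(range(num_actions), key=lambda a: expected[a]))
        for i in range(2):
            counts[i][choices[i]] += 1
    return [[c / sum(row) for c in row] for row in counts]


BASE_DELAY = [1.0, 1.5]     # assumed delay of relay 0 and relay 1
CONGESTION_PENALTY = 2.0    # assumed extra delay when both SUs share a relay


def delay(own, other):
    return BASE_DELAY[own] + (CONGESTION_PENALTY if own == other else 0.0)


freqs = fictitious_play(delay, num_actions=2)
print(freqs)  # empirical mixed strategies of the two SUs
```

For this toy game the indifference condition $1 + 2p = 1.5 + 2(1 - p)$ gives $p = 0.625$ as the equilibrium probability of choosing relay 0, and the empirical frequencies settle near that value.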

V. Simulation Results 

Fig. 2 shows the simulation results of a routing formation
from 4 source SU nodes to the nearest CPC base station within
$\mathcal{N}_\delta^A$. Fig. 2 contains two PUs, two CPC base stations,
4 source SUs, and 10 SU relay nodes in $\mathcal{N}_\delta^A$ within a
2 km x 2 km area. We assume that the PUs do not vary
their configurations or footprints. The SU source nodes are 1,
2, 3, and 4 at the first hierarchy level. SU relay nodes have
their own random arrival time and service time. The arrival
time depends on routing results at previous hierarchy levels
due to the dynamic utility functions. The nodes update their
mixed strategies at each iteration. The optimal action taken
at the iteration $k$ determines the vector $U_i$ at the following
iteration and is used to update the empirical frequencies. Each
SU forms its current belief about other users' actions
from the empirical frequencies and selects the best response action
in the next iteration. This iterative process continues until the
empirical frequencies converge to the Nash equilibrium.

Fig. 2 depicts 4 SU nodes, i.e., 1, 2, 3, and 4, at the first level
and 4 SU relay nodes, i.e., 5, 6, 7, and 8, at the next level. Fig. 5
shows the convergence of the mixed strategies vs. iterations 
at SU node 4 at the first hierarchy level. The node 4 updates its 
mixed strategies by increasing or decreasing the probabilities 
on selecting nodes 5, 6, 7, and 8, as the data of the play history 
accumulates. The mixed strategies finally converge to 0.9, 0.2, 
0.01, and 0.01 for connecting to nodes 5, 6, 7, and 8 within 60 
iterations. Therefore, node 4 selects node 7 to transmit packets 
within a reasonable iteration time. 

The results of the dynamic routing algorithm for densely
deployed SU networks are shown in Fig. 4. There are 3,000
nodes and relays randomly deployed within a 1.6 km by 2.0 km
area in Fig. 3. Two PUs are located at (-0.3, 0) with a
coverage radius of 0.5 km and at (0.6, -0.2) with a coverage
radius of 0.2 km, respectively. The arrival rate and the processing rate for
all SU nodes are randomly chosen. Fig. 4 shows two pairs of
routing results, from SU 2 to CPC BS 1 and from SU 1 to CPC BS 2.
In this simulation, all SU nodes have 1 W transmission power
with an interference range of 0.15 km and a path loss factor
$\alpha = 2.5$. The results of the proposed routing algorithm with
$\omega = 0.7$ are compared with the results of Dijkstra's algorithm and
medial axis (MA) routing as shown in Fig. 4. The blue lines
(3) are routes using Dijkstra's algorithm, and the black lines
(1) represent the routes using our algorithm. The red lines (2)
show the routes using the MA algorithm. Intuitively, Dijkstra's
algorithm causes higher interference with PU networks than
the proposed algorithm along the overall routing paths, and the MA
routing algorithm incurs much higher delay because of traffic
congestion on the medial axis path.

Fig. 6 shows the results of the interference comparison between
different algorithms. The x-axis represents the distance
between the source and destination and the y-axis is the normalized
interference. The interference with PUs is calculated using
the interference temperature model. The average delay results
versus distance of routing paths are shown in Fig. 7. From
Fig. 6 and Fig. 7, we can find that our proposed algorithm
avoids congestion and minimizes delay at a cost of a slight
increase in the interference compared to the MA algorithm.
Our algorithm also provides robust routing results by hierarchy
level network routing, which forwards packets to the nearest
CPC base station.




Fig. 2. Results of the dynamic routing algorithm
with 4 SU nodes and 10 relays randomly
deployed in $\mathcal{N}_\delta^A$ at t = t. $\omega = 0.7$ and
random U applied for every node. The arrowed
solid lines are final routing results, i.e., the Nash
equilibrium for each SU node.



Fig. 3. The network topology for 3,000 SU 
nodes and relays (dot) randomly deployed in the 
presence of 2 PUs. 



Fig. 4. Results of routing for the proposed algorithm
in comparison with Dijkstra's algorithm
and MA routing. $\omega = 0.7$ and random U
deployed for every SU node.




Fig. 5. Convergence of the mixed strategies vs.
iteration time of node 4 at the first hierarchy
level in Fig. 2.



Fig. 6. The normalized interference results versus
distance of routing paths for different
algorithms.



Fig. 7. The normalized delay results versus
distance of routing paths for different
routing algorithms.



VI. Conclusions 

In this paper, we have introduced a dynamic and distributed 
routing algorithm for multi-hop cognitive radio networks. The 
routing algorithm minimizes the interference with the PUs 
from multiple pairs of SU nodes to CPC base stations. We 
have used the medial axis scheme with the relaxation factor 
as a reference path in the global procedure. In the local 
distributed procedure of the algorithm, we have used fictitious 
play learning to find a mixed-strategy Nash equilibrium of the 
dynamic routing game in the multi-hop CR network. We have 
shown that the equilibrium can be found in closed form by 
backward induction. In simulations, we have observed that the
mixed strategy of each SU node using the learning algorithm
converges within a small number of iterations. The algorithm
reduces interference with PU networks and achieves low packet
delay from source nodes to CPC base stations.